Introduction to Computational Biology Lecture # 27: Microarrays and clustering
Abstract
We will use a variation of UPGMA. In UPGMA we have a distance matrix, and in each iteration we take the pair $i, j$ with the minimal distance, $\forall k, l : d_{i,j} \le d_{k,l}$, and create a new node that is their parent. In the UPGMA algorithm the new distances for this node are the average of the distances of its two children: $\forall l : D_{i+j,l} = \frac{1}{2}(d_{i,l} + d_{j,l})$. We will talk about a very similar idea for clustering genes, known as Eisen clustering: if we measure the expression of the genes in $m$ different conditions, each gene is represented by a vector in $\mathbb{R}^m$. If we are joining two genes $i, j$, then we create a new "gene" $(i+j)$ with a new vector defined by $\forall a : X_{i+j,a} = \frac{1}{2}(X_{i,a} + X_{j,a})$. If we use the Euclidean distance then this is the same as normal UPGMA, but in this method we instead use correlation, a measure of how related two vectors are. An intuition for correlation is this: if we have two vectors $X, Y \in \mathbb{R}^m$ and we want to approximate $Y$ as a linear function of $X$ (i.e., $Y_i = aX_i + b$), then the coefficient $a$ plays the role of the correlation between $X$ and $Y$ (when both vectors are standardized to zero mean and unit variance, the least-squares $a$ is exactly the Pearson correlation), and the larger $a$ is, the more correlated $X$ and $Y$ are. Un-normalized correlation:
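As a rough illustration of the joining procedure described in this abstract, here is a minimal Python sketch. The text breaks off before defining the un-normalized correlation, so the sketch assumes the standard Pearson correlation as the similarity measure; the function names `pearson` and `eisen_cluster` and the merge-list output format are hypothetical choices made for this example, not taken from the lecture.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two 1-D vectors (assumed similarity measure)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))


def eisen_cluster(profiles):
    """Greedy agglomeration of expression profiles.

    profiles: array-like of shape (n_genes, m_conditions); rows must not be
    constant, otherwise the correlation is undefined.
    Returns a list of merges (id_a, id_b, correlation). Original genes have
    ids 0..n-1; each merge creates a new node with the next free id.
    """
    profiles = np.asarray(profiles, dtype=float)
    active = {i: profiles[i] for i in range(len(profiles))}
    next_id = len(profiles)
    merges = []
    while len(active) > 1:
        ids = sorted(active)
        # Pick the most correlated pair (the analogue of the minimal distance).
        r, a, b = max(
            (pearson(active[a], active[b]), a, b)
            for k, a in enumerate(ids)
            for b in ids[k + 1:]
        )
        # New "gene" (a+b): X_{a+b} = 1/2 (X_a + X_b), as in the text.
        active[next_id] = 0.5 * (active.pop(a) + active.pop(b))
        merges.append((a, b, r))
        next_id += 1
    return merges
```

Calling `eisen_cluster` on an expression matrix with one row per gene returns the sequence of joins, from which a tree can be read off. The exhaustive pair search recomputes every correlation in each iteration, which keeps the sketch short but would be slow for thousands of genes.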
Similar resources
Introduction to Computational Biology Lecture # 8: Microarrays
Microarray is a technology developed for gene expression analysis. It takes advantage of the base-pairing property, that is, the formation of hydrogen bonds between single-stranded DNA and its reverse complement. As a double-stranded molecule, DNA reaches its most stable condition. The idea is to "fetch" a specific sequence by placing its reverse complement as "bait". We’ll refer the reverse comple...
Problem Based Learning or Lecture, A New Method of Teaching Biology to First Year Medical Students: An Experience
Introduction. In the previous studies in the field of medical education, problem based learning and lecture based learning have been compared, but, due to the learning habits of Iranian students and special condition of education, the effects of these two methods have been less investigated in Iranian universities so far. This study attempts to compare the effects of these two methods on studen...
Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics
The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique to analyze m...
Introduction to Computational Biology Lecture # 17: RNA Structure from Sequence
In the previous lesson we presented a Stochastic Context Free Grammar model for prediction of RNA secondary structure from sequence. This model enables us to assign a probability to every possible folding of a given RNA sequence. We will start this lecture with some comments about the relation between probabilities and energies. Then we will learn how to calculate the probabilities of structural...
Publication date: 2008